The Area under the ROC Curve as a Criterion for Clustering Evaluation
Authors
Abstract
The literature offers several criteria for validating a clustering partition. These criteria are either external or internal, depending on whether they use prior information about the true class labels or only the data itself. All of them assume a fixed number of clusters k and measure the performance of a clustering algorithm for that k. Instead, we propose a measure that quantifies the robustness of an algorithm across several values of k: it constructs a ROC curve and computes the area under that curve. We present ROC curves of a few clustering algorithms on several synthetic and real-world datasets and show which algorithms are less sensitive to the choice of the number of clusters, k. We also show that this measure can serve as a validation criterion in a semi-supervised setting, and empirical evidence shows that not all objects always need to be labeled to validate the clustering partition.
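The abstract does not specify how each value of k becomes a point on the ROC curve. A minimal sketch, assuming a pair-counting construction that is not necessarily the authors' own: every pair of objects is a "positive" if both share a true class label, and a partition "predicts positive" for a pair when it places both objects in the same cluster. Each partition (one per k) then yields one (FPR, TPR) point; a single cluster gives (1, 1) and all singletons give (0, 0), and the area is computed with the trapezoidal rule. The names `pair_rates` and `clustering_auc` are hypothetical.

```python
from itertools import combinations


def pair_rates(true_labels, cluster_labels):
    """Return (FPR, TPR) of one partition under an ASSUMED pair-counting view.

    A pair is positive if both objects share a true class label; the
    partition predicts positive if it puts both objects in one cluster.
    """
    tp = fn = fp = tn = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_class = true_labels[i] == true_labels[j]
        same_cluster = cluster_labels[i] == cluster_labels[j]
        if same_class and same_cluster:
            tp += 1
        elif same_class:
            fn += 1
        elif same_cluster:
            fp += 1
        else:
            tn += 1
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return fpr, tpr


def clustering_auc(true_labels, partitions):
    """Trapezoidal area under the ROC points of partitions for several k.

    The endpoints (0, 0) and (1, 1) are added so the curve always spans
    the full FPR range, whichever values of k were actually run.
    """
    points = [pair_rates(true_labels, p) for p in partitions]
    points += [(0.0, 0.0), (1.0, 1.0)]
    points.sort()  # order by FPR, then TPR
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


if __name__ == "__main__":
    truth = [0, 0, 0, 1, 1, 1]
    runs = [
        [0, 0, 0, 0, 0, 0],  # k = 1
        [0, 0, 1, 1, 2, 2],  # k = 3, mixes classes in one cluster
        [0, 1, 2, 3, 4, 5],  # k = 6, all singletons
    ]
    print(clustering_auc(truth, runs))
```

For the toy run above the k = 3 partition sits at (1/9, 1/3), so the area is 11/18, whereas replacing it with the perfect two-cluster partition would push the curve to the top-left corner and give an area of 1.0.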
Similar resources
Receiver Operating Characteristic (ROC) Curve Analysis for Medical Diagnostic Test Evaluation
This review presents the basic principle and rationale of ROC analysis for rating and continuous diagnostic test results versus a gold standard. Among the derived accuracy indexes, the area under the curve (AUC) in particular has a meaningful interpretation for distinguishing diseased from healthy subjects. Methods for estimating the AUC and testing it, both for a single diagnostic test and in comparative studies...
Maximizing the Area under the ROC Curve using Incremental Reduced Error Pruning
The use of incremental reduced error pruning for maximizing the area under the ROC curve (AUC) instead of accuracy is investigated. A commonly used accuracy-based exclusion criterion is shown to include rules that result in concave ROC curves as well as to exclude rules that result in convex ROC curves. A previously proposed exclusion criterion for unordered rule sets, based on the lift, is on ...
Using the mimetic technique to increase AUC. Some very preliminary ideas
The Area Under the ROC Curve (AUC) has been recognised as a very robust measure for classification evaluation. Recent efforts have focussed on modifying existing algorithms to improve their AUC or even to use AUC as a search criterion [6]. In this paper, we suggest the possibility that we could improve the AUC of existing models and techniques, without changing the techniques or retraining the ...
CROC: A New Evaluation Criterion for Recommender Systems
Evaluation of a recommender system algorithm is a challenging task due to the many possible scenarios in which such systems may be deployed. We have designed a new performance plot called the CROC curve with an associated statistic: the area under the curve. Our CROC curve supplements the widely used ROC curve in recommender system evaluation by discovering performance characteristics that stan...
A Large Deviation Bound for the Area Under an ROC Curve
The area under an ROC curve (AUC) has been advocated as an evaluation criterion for bipartite ranking problems. In this paper, we study large deviation properties of the AUC; in particular, we derive a distribution-free large deviation bound for the AUC which serves to bound the expected accuracy of a ranking function in terms of its empirical AUC on an independent test sequence. A comparison ...
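The empirical AUC of a ranking function on a test sequence is commonly computed as the fraction of (positive, negative) pairs that the function scores in the correct order, with ties counted as one half. A minimal sketch, assuming binary labels encoded as 1 (positive) and 0 (negative); the function name `empirical_auc` is hypothetical and the quadratic pair loop is chosen for clarity rather than speed.

```python
def empirical_auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2.

    ASSUMPTION: labels are 1 for positive and 0 for negative examples.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0  # positive ranked above negative
            elif p == n:
                correct += 0.5  # tie: half credit
    return correct / (len(pos) * len(neg))


if __name__ == "__main__":
    # Three of the four positive-negative pairs are ordered correctly.
    print(empirical_auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))
```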